An Empirical Comparison of Discretization Methods
Authors
Abstract
Many machine learning and neurally inspired algorithms are limited, at least in their pure form, to working with nominal data. However, for many real-world problems, some provision must be made to support processing of continuously valued data. This paper presents empirical results obtained by using six different discretization methods as preprocessors to three different supervised learners on several real-world problems. No discretization technique clearly outperforms the others. Also, discretization as a preprocessing step is in many cases found to be inferior to direct handling of continuously valued data. These results suggest that machine learning algorithms should be designed to directly handle continuously valued data rather than relying on preprocessing or ad hoc techniques.
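As an illustration of what a discretization preprocessor does, here is a minimal sketch of equal-width binning in Python. The abstract does not name the six methods that were compared, so this is not necessarily one of them. The function name `equal_width_bins`, its parameters, and the sample readings are hypothetical.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each continuous value to a bin index in [0, n_bins - 1].

    Illustrative only: equal-width binning is a common baseline and is
    assumed here, not taken from the paper.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls in the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical continuous attribute turned into three nominal values.
readings = [1.0, 2.5, 4.0, 7.5, 10.0]
labels = equal_width_bins(readings, 3)
print(labels)
```

The resulting bin indices can be passed to a learner that accepts only nominal inputs. The bin boundaries ignore the class labels, which is one reason a fixed scheme like this can lose information that a learner handling continuous values directly would keep.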
Similar resources
Error-Based and Entropy-Based Discretization of Continuous Features
We present a comparison of error-based and entropy-based methods for discretization of continuous features. Our study includes both an extensive empirical comparison as well as an analysis of scenarios where error minimization may be an inappropriate discretization criterion. We present a discretization method based on the C4.5 decision tree algorithm and compare it to an existing entropy-based ...
Comparison of different empirical methods for estimating daily reference evapotranspiration in the humid cold climate (case study: Borujen, Shahrekord, Koohrang and Lordegan)
The recommended method for calculating potential evapotranspiration is the FAO Penman-Monteith method, but other methods require less meteorological data and give estimates close to those of the FAO Penman-Monteith method under different climatic conditions. Performance evaluation of these methods on the same basis is a prerequisite for selecting an alternative approach in accordance with available da...
An Empirical Comparison between Grade of Membership and Principal Component Analysis
It is the purpose of this paper to contribute to the discussion initiated by Wachter about the parallelism between principal component (PC) and a typological grade of membership (GoM) analysis. The author tested empirically the close relationship between both analyses in a low-dimensional framework comprising up to nine dichotomous variables and two typologies. Our contribution to the subject is also...
Proposal and Empirical Comparison of a Parallelizable Distance-Based Discretization Method
Many classification algorithms are designed to work with datasets that contain only discrete attributes. Discretization is the process of converting the continuous attributes of the dataset into discrete ones in order to apply some classification algorithm. In this paper we first review previous work in discretization, then we propose a new discretization method based on a distance proposed by ...
An Evolutionary Multi-objective Discretization based on Normalized Cut
Learning models and their results depend on the quality of the input data. If raw data is not properly cleaned and structured, the results tend to be incorrect. Therefore, discretization, as one of the preprocessing techniques, plays an important role in learning processes. The most important challenge in the discretization process is to reduce the number of features’ values. This operat...